New MLCommons benchmarks to test AI infrastructure performance

The latest release also broadens its scope beyond chatbot benchmarks. A new graph neural network (GNN) test targets datacenter-class hardware and is designed for workloads like fraud detection, recommendation engines, and knowledge graphs. It uses the RGAT model, run against a graph dataset containing more than 547 million nodes and 5.8 billion edges.
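For a sense of what an RGAT-style workload computes, here is a minimal sketch of a relation-aware graph attention layer in plain PyTorch. The dimensions, relation count, and toy graph are illustrative assumptions, and this is not the MLPerf harness; the actual test runs the full RGAT model against the 547-million-node dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRGATLayer(nn.Module):
    """Toy relation-aware graph attention layer (illustrative, not the MLPerf model)."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        # One projection matrix and one attention vector per relation type.
        self.w = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.1)
        self.attn = nn.Parameter(torch.randn(num_relations, 2 * out_dim) * 0.1)

    def forward(self, x, edge_index, edge_type):
        # x: [nodes, in_dim]; edge_index: [2, edges]; edge_type: [edges]
        src, dst = edge_index
        w_e = self.w[edge_type]                              # per-edge relation weights
        h_src = torch.bmm(x[src].unsqueeze(1), w_e).squeeze(1)
        h_dst = torch.bmm(x[dst].unsqueeze(1), w_e).squeeze(1)
        # Relation-specific attention score for each edge.
        logits = F.leaky_relu(
            (torch.cat([h_src, h_dst], dim=-1) * self.attn[edge_type]).sum(-1)
        )
        # Softmax over each destination node's incoming edges.
        num = torch.exp(logits - logits.max())
        denom = torch.zeros(x.size(0)).index_add_(0, dst, num)
        alpha = num / denom[dst].clamp(min=1e-9)
        # Aggregate attention-weighted messages into destination nodes.
        out = torch.zeros(x.size(0), h_src.size(-1))
        out.index_add_(0, dst, alpha.unsqueeze(-1) * h_src)
        return F.relu(out)

if __name__ == "__main__":
    x = torch.randn(4, 8)                       # 4 nodes, 8 features each
    edge_index = torch.tensor([[0, 1, 2, 3],    # source nodes
                               [1, 2, 3, 0]])   # destination nodes
    edge_type = torch.tensor([0, 1, 0, 1])      # relation id per edge
    layer = TinyRGATLayer(in_dim=8, out_dim=16, num_relations=2)
    print(layer(x, edge_index, edge_type).shape)  # torch.Size([4, 16])
```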
Judging performance
Analysts suggest that these benchmarks will make it easier to compare the performance of chips and clusters from different vendors against documented models.
“As every chipmaker seeks to prove that its hardware is good enough to support AI, we now have a standard benchmark that shows the quality of question support, math, and coding skills associated with hardware,” said Hyoun Park, CEO and Chief Analyst at Amalgam Insights.
Chipmakers can now compete not just on traditional speeds and feeds, but also on mathematical skill and informational accuracy. The benchmark provides a rare opportunity to set new performance standards across vendors' hardware, Park added.
“The latency in terms of how quickly tokens are delivered and the time for the user to see the response is the deciding factor,” said Neil Shah, partner and co-founder at Counterpoint Research. “This is where players such as NVIDIA, AMD, and Intel have to get the software right to help developers optimize the models and bring out the best compute performance.”
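As an illustration of the latency measures Shah describes, the sketch below derives time-to-first-token (TTFT) and token throughput from a streamed response. The streaming generator is a hypothetical stand-in, not an MLPerf harness or any vendor's API.

```python
import time

def measure_streaming_latency(token_stream):
    """Return (ttft_seconds, tokens_per_second) for one streamed response."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start          # first token the user sees
        count += 1
    total = time.perf_counter() - start
    tps = count / total if total > 0 else 0.0
    return ttft, tps

def fake_token_stream(n=20, delay=0.01):
    # Hypothetical stand-in for a model server streaming tokens.
    for i in range(n):
        time.sleep(delay)
        yield f"token{i}"

if __name__ == "__main__":
    ttft, tps = measure_streaming_latency(fake_token_stream())
    print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.1f} tokens/s")
```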
Benchmarking and buying decisions
Independent benchmarks like those from MLCommons play a key role in helping buyers evaluate system performance, but relying on them alone may not provide the full picture.